Co-occurrence probabilities between mosquito vectors of West Nile and Eastern equine encephalitis viruses using Markov Random Fields (MRFcov)

Mosquito vectors of eastern equine encephalitis virus (EEEV) and West Nile virus (WNV) in the USA reside within broad multi-species assemblages that vary in spatial and temporal composition, relative abundances and vector competence. These variations impact the risk of pathogen transmission and the operational management of these species by local public health vector control districts. However, most models of mosquito vector dynamics focus on single species and do not account for co-occurrence probabilities between mosquito species pairs across environmental gradients. In this investigation, we use for the first time conditional Markov Random Fields (CRF) to evaluate spatial co-occurrence patterns between host-seeking mosquito vectors of EEEV and WNV around sampling sites in Manatee County, Florida. Specifically, we aimed to: (i) quantify correlations between mosquito vector species and other mosquito species; (ii) quantify correlations between mosquito vectors and landscape and climate variables; and (iii) investigate whether the strength of correlations between species pairs are conditional on landscape or climate variables. We hypothesized that either mosquito species pairs co-occur in patterns driven by the landscape and/or climate variables, or these vector species pairs are unconditionally dependent on each other regardless of the environmental variables. Our results indicated that landscape and bioclimatic covariates did not substantially improve the overall model performance and that the log abundances of the majority of WNV and EEEV vector species were positively dependent on other vector and non-vector mosquito species, unconditionally. Only five individual mosquito vectors were weakly dependent on environmental variables with one exception, Culiseta melanura, the primary vector for EEEV, which showed a strong correlation with woody wetland, precipitation seasonality and average temperature of driest quarter. Our analyses showed that majority of the studied mosquito species’ abundance and distribution are insignificantly better predicted by the biotic correlations than by environmental variables. Additionally, these mosquito vector species may be habitat generalists, as indicated by the unconditional correlation matrices between species pairs, which could have confounded our analysis, but also indicated that the approach could be operationalized to leverage species co-occurrences as indicators of vector abundances in unsampled areas, or under scenarios where environmental variables are not informative. Graphical Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s13071-022-05530-1.

Keywords Interspecies interaction, Spatial distribution, Conditional Markov Random Fields, Host-seeking mosquito, Community ecology

Background
Mosquito vector-borne pathogens such as eastern equine encephalitis virus (EEEV) and West Nile virus (WNV) are maintained and proliferate in the natural environment via a complex set of requirements and interactions with their hosts, underlying environmental variables and interactions with other organisms [1]. Both of these disease pathogens are autochthonously transmitted in the USA since their introduction in the 1990s and pose a continuous and consistent threat, particularly where multiple mosquito vectors are found. Although the inherent complexity of EEEV and WNV transmission systems is recognized broadly within community ecology [2], theoretical frameworks specific to disease ecology focus primarily on the biodiversity of vertebrate hosts and place much less emphasis on the biodiversity of insect vector species when considering arbovirus transmission dynamics [3][4][5]. However, mosquito vectors of these two pathogens reside within broader multi-species assemblages that vary in composition, abundances and vector competency to transmit EEEV and WNV, which can collectively impact pathogen transmission in a geographic area [6][7][8][9][10]. In Florida, the EEEV and WNV risk is a composite distributed across multiple species of the competent bridge and main mosquito vectors that may be sympatric but vary in abundance, diversity and vector competence [11][12][13][14][15]. More than 60 mosquito control districts (MCDs) conduct routine surveillance for these vector species to guide specific vector control efforts in Florida [16]. Unfortunately, vector control capabilities are limited and need to be carefully targeted toward the spatial and temporal distribution of mosquito vectors to be effective. Importantly, vector control techniques vary depending on the target mosquito species, which adds to the complexity of designing effective control programs against mosquito vectors of EEEV and WNV.
While range-wide distributions of single species are now routinely estimated using ecological niche models [17,18], the distribution and abundance of species at local scales are likely to involve both meso-and microscale landscape features and the potential for interactions with other species that form the mosquito community. This complexity requires a different set of underlying data and analytical toolkits that can estimate both landscape factors and factors that promote or impede community co-occurrence. Due to challenges from both the data and analytical sides, work examining these factors in a single framework has remained piecemeal at best, with studies often focusing on just a subset of species and their possible interactions [17,19,20] or on larval distributions [21], not those of adults. However, virus transmission occurs in the adult life-stage of mosquitoes, thus investigating co-occurrence patterns of adult mosquitoes is essential for understanding transmission risk across geographic areas. Further, adult mosquito trapping data are collected routinely by MCDs, providing a means to scale-up analysis broadly.
A key challenge in predicting local-scale species distributions and community composition is accounting for covariance between species and the environment. Generalized additive models (GAMS) have been used to explore relationships between abundances of potential competitors in mosquito assemblages and a vegetation gradient [22], while other methods have focused on pairwise probability calculations between individual species [23][24][25], with some similarities across methods [26]. When considering landscape-scale co-occurrence, a particularly powerful and yet unused approach for mosquitoes is to first quantify correlations between species pairs and then determine whether the strength of these correlations is conditional on environmental variables using a conditional Markov Random Fields (CRF) analysis [25]. This approach simultaneously considers both biotic and abiotic factors that may be controlling the shape of species abundances, distributions and community composition across environmental gradients in space.
In this study, we leverage the capability of longitudinal collection data for mosquito communities, with the main emphasis on vectors of EEEV and WNV from Manatee County, Florida over the 2020 sampling season (May-December) to: (i) quantify correlations between host-seeking mosquito vector species of WNV and EEEV and other mosquito species, vectors and non-vectors; (ii) quantify correlations between host-seeking mosquito vectors and landscape and climate variables within their flying ranges; and (iii) investigate whether the strength of correlations between species pairs are conditional on landscape or climate variables using CRF analyses.
We hypothesized that species composition and abundances of WNV-and EEEV-competent mosquito species are most likely determined by co-occurrences between species pairs in specific landscape and/or climate features; that is, landscape features generally important for modifying species co-occurrence. Alternately, it may be that most vector mosquitoes are habitat generalists and generally co-occur regardless of landscape. The end goal of using this approach is to better understand the joint effects of landscape and other mosquito species drivers on mosquito diversity/density and provide data-driven information for more comprehensive management and control strategies.

Study area and mosquito collections
Georeferenced 2020 mosquito trap data collected by Manatee County Mosquito Control District (MCMCD), Florida, were acquired from the VectorBase Bioinformatics Resource for Invertebrate Vectors of Human Pathogens repository (https:// vecto rbase. org; 2021). The MCMCD 2020 data resulted from collections using US Centers for Disease Control and Prevention (CDC) CO 2 -baited light traps set at 56 locations at weekly intervals from approximately May to December (Fig. 1). Although some trap and attractant biases exist, CDC CO 2 -baited light traps collect diverse mosquito species in Florida [27]. This was demonstrated by the mosquito species that were consistently collected in the 2020 MCMCD data set representing flood water, salt marsh and container-inhabiting mosquito communities. Light traps were set for approximately 12 h before sunset until dawn, and mosquito collections were identified to species by trained mosquito control personnel using the Darsie and Ward (2005) taxonomic key [28]. Species counts for each sampling week were recorded and formatted in Microsoft Excel (Microsoft Corp., Redmond, WA, USA) spreadsheets prior to submission to the VectorBase platform [29]. The mean number of mosquitoes per trap night per species was calculated at each trap site across 28 weeks during the 2020 sampling season, and a 'site-by-species' matrix was created with individual trap locations occupying rows and individual species occupying columns in preparation for analyses (Additional file 1: Data file S1).

Vector-competent mosquito species
Laboratory-confirmed vector competency of mosquito vector species for EEEV and WNV were identified from the scientific literature (Table 1), based on the collected mosquito species from the MCMCD. The field-confirmed mosquito vectors of EEEV and WNV were also identified from previous studies and denoted as putative vector species in our study. We only included 17 WNV-and EEEV-competent vector species in our results and discussion. Other non-vector mosquito species are included in Additional file 2: Table S1.

US Geological Survey (USGS) Conterminous United
States Land Cover Projections 1992-2100 were extracted for 2020 [47] and served as land cover data in our analyses. These data have a 250-m spatial resolution and consist of annual land cover classifications. We focused on four major land cover classifications found in Manatee County as representative of different levels of anthropogenic disturbance across the study area: developed, cropland, woody wetland and herbaceous wetland [48]. We quantified and extracted area percentages of each land cover type within both 5-km and 10-km buffers surrounding each mosquito trap location using the 'landscape metrics' package in R [49,50].
Bioclimatic variables within buffer sets surrounding each trap site from 2020 daily 'Parameter-elevation Regressions on Independent Slopes Model' (PRISM) Climate Group data [51] were extracted at an 800-m spatial resolution using the 'dismo' package in R [52]. PRISM data were accessed from https:// prism. orego nstate. edu/.
To reduce the number of variables in our model, five bioclimatic variables were selected for analyses: Bio2 (mean diurnal temperature range), Bio5 (the maximum temperature of the warmest month), Bio9 (mean temperature of the driest quarter), Bio15 (precipitation seasonality) and Bio17 (precipitation of the driest quarter), based on mosquito biology and ecological data reported in previous studies [18,21,53]. Bioclimatic variables were all scaled to range between 0 and 1 in preparation for modeling given the widely different units in the raw data.

Statistical analyses
We used CRFs executed in the 'MRFcov' (Markov Random Fields with additional covariates)package in R [25] to quantify whether the abundances of each WNV and EEEV mosquito vector species were (i) unconditionally dependent on another mosquito species in the assemblage and (ii) unconditionally dependent on a landscape or bioclimatic variable; and whether (iii) the strength level of dependence between species pairs was conditional on a landscape and/or bioclimatic variable or (iv) there were no correlations between species pairs nor between individual species and environmental variables. [25]. Accordingly, the unconditional correlations were estimated using the generated regression correlation matrices generated by the MRF and CRF analyses which refers to consistent correlation between either species pairs, or between individual vector species and environmental variables, with and without covariates using MRF and CRF analyses. Unlikely conditional correlations are: (i) correlation between species pairs only at specific habitats or climate which did not show unconditional correlations; (ii) correlations between single species and landscape and/or climate variables; or (iii) increased correlation strength between species pairs at specific habitats or climate in addition to their unconditional correlations, such as the correlation between Coquillettidia perturbans and Mansonia dyarii indicated by the generated regression correlation matrices (Tables 1, 3).
To prepare data for analysis, we rounded the mean mosquito per trap night value within our 'site-by-species' matrix to an integer value to serve as nodes (mosquito species) and added additional columns with percentage landscape and average bioclimatic variables to serve as the conditional variables in the model. To calculate CRF using abundance data, the 'MRFcov' package log-transforms species counts before performing pairwise linear regressions across all combinations of species and environmental variables, using an optimized regularization multiplier for variable selection and to reduce overfitting; predicted and observed values for all species combinations are then used in model evaluation [25]. Geographic coordinates at each mosquito trap location were included to fit a spatial spline to account for residual spatial autocorrelation that can inflate parameter estimates resulting in Type I errors [54]. Bootstrap spatial models analyses using 500 replicates and 100% of sampling points with random replacement in each replicate were used to capture uncertainty in parameter estimates [25], and key regression coefficients of each species were output to a single table showing the relative importance of each variable with a threshold value of > 0.01 and mean coefficient values (Additional file 1: Data file S1). The relative importance values indicate the relative strength of a variable on the log abundance of a vector species out of all variable combinations calculated for the species, while the sign of the mean coefficient values shows the direction of these correlations. Two separate models were run within each of the 5-km and 10-km buffer distances from each trap location: one model with and one model without environmental variables.

Results
Manatee County, Florida, is located on the western coast of the Florida Peninsula on the Gulf of Mexico (Fig. 1). Along the coast, the area is primarily covered by developed land, while the inland extent of the county is predominantly rural and consists of agricultural land interspersed with wetlands [47]. The region is characterized by a humid subtropical climate [55] [57], with most of it falling during the rainy season, which typically lasts from May to October.
A total of 2,009,985 adult female mosquitoes representing 30 species and eight genera were collected across 56 trap sites during the 2020 mosquito trap surveillance sampling from May to December. Initial exploration of mosquito abundances by genera across the trap sites indicated variability in the abundance of mosquito genera across these sites. Culex was the dominant genus found across these sites; however, approximately 14% of sites were dominated by Aedes mosquitoes (Fig. 2).

Statistical tests
Box plots for bootstrapped models with no covariates (MRF) and with covariates (CRF) measured within 5-km and 10-km buffer distances indicated that the inclusion of landscape and bioclimatic variables did not substantially improve the overall model performance when evaluated by deviance (DV) or mean squared error (MSE) values ( Fig. 3; Additional file 1: Figure S1). The insignificant differences in DV and MSE values reflect that the change in number and strength of correlations between species pairs, as indicated by relative importance values, did not change the overall spatial correlations between mosquito species pairs. This insignificant small change in correlations between species pairs without (Fig. 4) and with environmental variables (Fig. 4), as demonstrated in network plots, showed that the number and direction of correlations between individual species pairs were slightly impacted by the environmental variables included in the model. The plot shown in Fig. 4b demonstrates correlations between species pairs that were not affected by the inclusion of environmental variables and correlations between species pairs where the strength of correlation varies with the inclusion of an environmental variable. The reduced number of species pairs in the plot shown in Fig. 4b compared to that shown in Fig. 4a indicated that several species pairs were highly correlated when no environmental variables were included in the model but that these correlations reduced to zero when environmental variables were added.
Model regression correlation matrices included key coefficient tables that summarized the relative importance of correlations between log abundances of species pairs or between the log abundance of an individual vector species and environmental variables (i.e. unconditional correlations with another species measured within 5-km and 10-km buffer distances), or conditional correlations between species pairs where the strength of the correlation changes in specific land cover classes or climate conditions measured within 5-km and 10 km-buffer distances (Tables 1, 3: 5-km buffer; Additional file 2: Table S1; Additional file 3: Table S2 for all species within 5 km and 10 km). The corresponding mean coefficient values derived across the 500 bootstrapped model replicates, using random replacement in each replicate, with 5% and 95% quantiles provide a measure of uncertainty.
Key coefficient values for all species combinations with relative importance values > 0.01 are available in Additional file 2: Table S1; Additional file 3: Table S2.
Overall, the regression coefficients shown in Tables 1-3 indicated that a greater number of vector species were unconditionally correlated with another mosquito vector species than with the environment; however, some species did exhibit correlations with landscape and climate variables, and several species pairs were no longer correlated with one another when environmental variables were added. The log abundances for 16 of the 20 WNV and EEEV vector species investigated were unconditionally correlated with another mosquito vector or non-vector species (Table 1). Also, five vector species demonstrated conditional dependence on environmental variables ( Table 2). Out of these five vectors, three WNV vector species, one EEEV vector species and one vector species of both WNV and EEEV were conditionally dependent on three climates (Bio2, Bio9, Bio15) and two landscape variables (developed and woody wetland) measured within a 5-km buffer distance (Table 2).   For example, Culiseta melanura, the primary vector for EEEV, only showed a strong conditional correlation with precipitation seasonality (Bio15; relative importance = 0.446), average temperature of driest quarter (Bio 9; relative importance = 0.303) and woody wetland (relative importance = 0.226), and not with other mosquito species (Table 2). Results from models including environmental variables measured within 10-km buffer distances indicated that five vector species showed conditional correlations with four climate variables (Bio2, Bio5, Bio9, Bio17) and one landscape (developed) variable (Additional file 3: Table S2). We found limited evidence for conditional correlations between species pairs where the strength of correlations between these pairs changes in a specific landscape or climate variable, in both 5-km (Table 3) and 10-km buffer distances (Additional file 3: Table S2). Models run within a 5-km buffer distance indicated this category of conditional correlations between only three species pairs in specific habitats (cropland, woody wetland, herbaceous wetland) ( Table 3) where at least one species in the species pairs is a vector. The relative importance of environmental variables on conditional correlations increased within the 10-km buffer distance, with nine species pairs demonstrating conditional correlations varying with landscape and climate variables. While climate variables did not have any effect on the strength of conditional correlation between species pairs within 5 km (Table 3), climate variables impacted the strength of conditional correlations between seven species pairs within 10-km buffer radii (Additional file 3: Table S2).

Discussion
The diversity of host-seeking mosquito vectors with different feeding preferences and their spatial and temporal co-occurrences have been highlighted in previous studies to play an important role in the circulation, maintenance and transmission of disease pathogens in mosquito populations [6,8,10]. In this study, we investigated abundances of known and putative WNV and EEEV vector species using a community ecology approach that quantified correlations with other vector and non-vector mosquito species, as well as with landscape and climate variables, and then determined if and how the strength of correlations between species pairs change across environments. The result is a novel view of mosquito vector occurrence in the context of abiotic and community factors and highlights the potential to use species co-occurrences as indicators of vector abundances in the absence of direct observations, or under scenarios where environmental variables are not informative.
Based on previous empirical observations linking mosquito vector abundances with environmental variables [59,60], we expected to find that log abundances of vector species would be strongly correlated with the landscape and climate variables. Surprisingly, our results indicated that the log abundances of 10 out of 13 WNV vector species, three out of four EEEV vector species and three vector species for both WNV and EEEV were positively correlated with other mosquito species, but only weakly correlated or not correlated at all with environmental variables. We only found three cases of negative correlation between species: Aedes aegypti and Mansonia titillans (mean coefficient value = − 0.044), Aedes infirmatus and Aedes fulvus pallens (mean coefficient value = − 0.059), and Aedes taeniorhynchus and Aedes atlanticus (mean coefficient value = − 0.059), indicating low log abundances of the former species of each pair at collection sites were associated with high abundances of the respective latter species (Table 1).
A challenge with interpreting co-occurrence results is how to link those to the underlying mechanisms. Cooccurrence can provide a basis for more detailed studies attempting to demonstrate direct biotic interactions. It may also be that co-occurrence instead reflects differential micro-scale habitats not fully captured in the abiotic variables used. In the example above of a negative cooccurrence of Ae. aegypti and Ma. titillans, Ae. aegypti prefers water containers in urban areas [20,61,62], whereas Ma. titillans requires more permanent freshwater with emergent aquatic vegetation [53,63]. Given that our models were estimated for host-seeking mosquitoes, within their flight ranges, and do not fully capture the microscale habitat preferences, the negative co-occurrence may simply be due to this microscale patterning, rather than, for example, direct competition.
The correlation of WNV and EEEV vectors with other species and less with environmental variables, as shown in our results, may indicate: (i) geographic overlapping, due to small study area, in the flight ranges between the studied mosquito species, and/or (ii) that some of these vectors are typically broad-habitat generalists, which can present challenges when investigating occurrence patterns using environmental variables alone. For example, the strong co-occurrence between Culex restuans and Culex quinquefasciatus and moderate co-occurrence between Aedes vexans and Ae. infirmatus may again reflect broad occurrence across landscape types. In addition, the potential for unmeasured covariance between spatial and temporal niche dynamics, especially given these taxa are known to be tied to the dynamics of wet season timing in North and Central Florida, may contribute to observed patterns [53,64].
Our modeling approach does clearly delineate some broad-scale habitat specialists. For example, Cs. melanura, the primary enzootic vector of EEEV, was strongly correlated with landscape and bioclimatic variables, but not with other mosquito species. Compared to the more generalist vectors in our study area, Cs. melanura is a known specialist species with a strong preference for hardwood swamps as breeding habitats [65], and our results are consistent with those of previous studies that associated this species with woody wetland [66][67][68]. Another species, Ae. taeniorhynchus, was most abundant in areas with low mean diurnal temperature ranges, which almost certainly reflects its coastal affinities [69][70][71], where residual heating or cooling from ocean temperatures reduces onshore fluctuations in temperatures and increases water salinity.
Variation in the strength and direction of dependence between pairs of vector species across different environmental variables was of particular interest in terms of the goal of providing more comprehensive information about habitats in which multiple vector species may occur. Only one vector species, Cq. perturbans, demonstrated such conditional correlations. The shifts in the strength of mean coefficient values between Cq. perturbans and Ma. dyarii was affected by cropland habitats. The positive correlation between Cq. perturbans and Ma. dyari in croplands specifically, including wooded areas, reflects the importance of these habitats in predicting the level and strength of correlations between the two species compared to their unconditional correlations with each other in other habitats [72][73][74][75]. This conditional correlation brings home that climatic and land-use changes may differentially shift risks for different disease vector abundances such as, for example, shifts in dry quarter precipitation differentially favoring Culex. nigripalpus (a competent WNV vector) at the expense of Ae. taeniorhynchus (a less competent WNV vector).
We observed insignificant differences between model performance with and without environmental variables for co-occurrences between species pairs. We also observed only slight variation in model results when comparing effects of environmental variables measured within 5-km buffer distances and within 10-km buffer distances, as indicated by the relative importance values. However, increasing our buffer radius from 5 to 10 km resulted in a slight increase in the number of vector species demonstrating correlations with another mosquito species only, and not with environmental variables, from 32 to 34 pairs. Increased buffer extents capture greater mosquito communities and potential variability in climate and landscape conditions, which may be only marginally variable across smaller geographic areas such as Manatee County. Considerations of scale in the use of such approaches are particularly important to consider, especially given our discussion above regarding complexities with interpreting co-occurrence (or coabundance) in relation to (here unmeasured) microhabitat drivers.
Although the collected mosquito vector diversity in the current study was robust, additional longitudinal data of mosquito collections, which could include other sampling techniques such as ovitraps, are needed to capture intra-and inter-annual population fluctuations between species pairs and to investigate additional environmental covariates at different resolutions across space and time. Moreover, the conditional correlations between host-seeking disease-vector species and other species not involved in the transmission of pathogens in specific habitats and climate conditions need further investigation to identify variation in both intra-and inter-seasonal correlations using a robust data collection across time and space. Additionally, because our study focused on hostseeking female mosquitoes, further investigation into the contribution of mosquito flight distances and their contributions to observed patterns is warranted. The purpose of this study was not to dissect the underlying processes and mechanisms that determine community abundances across our study area; however, our results highlight points of interest for continued investigation in the context of understanding underlying transmission risk. Specifically, continued investigation into the contributions of competition/exclusion in mature and immature habitats and the role of such biotic interactions in the distribution of vector mosquitoes will be critical.

Conclusion
The landscape and bioclimatic covariates did not substantially improve the overall model performance, and the majority of WNV and EEEV vector species were positively correlated with other vector and non-vector mosquito species. This may reflect: (i) the small geographic size of the study area with less environmental heterogeneity and that distances between habitats are within the foraging range of most of mosquito species; (ii) that mosquito abundance and distribution in our study sites are predicted by the biotic factors (here unmeasured) in the water habitats, such as abundance of other mosquito species, and not climate; and/or (iii) that the mosquito community in Manatee County is habitat generalist, according to literature from similar studies. Only one exception, Culiseta melanura, the primary vector for EEEV, showed a strong conditional correlation with woody wetland, precipitation seasonality and average temperature of driest quarter, but not any other species. Some of the studied mosquito vector species are habitat generalists, indicated by a low number of conditional correlations with environmental variables but which also indicated that the approach could be operationalized to leverage species co-occurrences as indicators of vector abundances in unsampled areas, or under scenarios where environmental variables are not informative. Also, considerations of geographic scale in the use of CRF approach are particularly important to be addressed in future studies to explain the complexities of co-occurrence (or co-abundance) in relation to microhabitat drivers.
Additional file 2: Table S1. Key coefficient values derived from 500 bootstrap replicates for models run with environmental covariates measured within a 5-km distance from trap locations. SpName is the mosquito species response variable, Variable is the explanatory variable and rows with two variables separated by an "_" indicate conditional dependence. Rel_importance represents the relative importance of the Variable on the log abundance of the SpName count out of all combinations of variables for the individual SpName. Lower is the lower 10% confidence level, Mean_coef is the mean coefficient value and Upper is the upper 90% confidence value. Rows with NA values indicate that the log abundances SpName did not demonstrate dependence on another variable (species or environmental variable) included in the data set.
Additional file 3: Table S2. Key coefficient values derived from 500 bootstrap replicates for models run with environmental covariates measured within a 10-km distance from trap locations. SpName is the mosquito species response variable, Variable is the explanatory variable and rows with two variables separated by an "_" indicate conditional dependence. Rel_importance represents the relative importance of the Variable on the log abundance of the SpName count out of all combinations of variables for the individual SpName. Lower is the lower 10% confidence level, Mean_coef is the mean coefficient value and Upper is the upper 90% confidence value. Rows with NA values indicate that the log abundances SpName did not demonstrate dependence on another variable (species or environmental variable) included in the data set.